Indoor scene recognition is a multi-faceted and challenging problem due to the diverse intra-class variations and the confusing inter-class similarities. This paper presents a novel approach that exploits rich mid-level convolutional features to categorize indoor scenes. Traditionally used convolutional features preserve the global spatial structure, which is a desirable property for general object recognition. However, we argue that this structure is not very helpful when scene layouts vary widely, as they do in indoor scenes. We propose to transform the structured convolutional activations to another highly discriminative feature space. The representation in the transformed space not only incorporates the discriminative aspects of the target dataset, but also encodes the features in terms of the general object categories that are present in indoor scenes. To this end, we introduce a new large-scale dataset of 1300 object categories that are commonly present in indoor scenes. Our proposed approach achieves a significant performance boost over previous state-of-the-art approaches on five major scene classification datasets.